Skip to main content

data analysis

1. Description

data_analysis is a command-line tool for performing data quality checks, calculations, search operations, transformations, and file operations on text-based data files. It runs independently from any directory with no external dependencies.

Typical run flow: Choose run mode → Choose file → Enter header/footer rows → Choose category → Choose activity → Enter inputs → Confirm → Run → Review result

2. Starting the Tool

starting_the_tool

The data_analysis tool can be run from any directory without any dependency. The username is expected for logging and querying purposes.

3. Run Mode Selection

run_mode_selection

Select how much system resources the tool should use during processing. Choose based on how fast you need results versus how much load you want on the server.

#OptionDescription
1Rabbit ModeUses maximum resources available in the system. Best when you want faster processing.
2Tortoise ModeUses minimum basic required resources. Best when you want lower system usage.

4. File Selection

file_selection

Choose the input file you want to work on. You can type the path directly or use the file picker suggestions. There is no limitation to absolute or relative path.

5. Delimiter Detection

delimiter_detection

The delimiter of the txt file is automatically detected. If auto-detection fails, you will be prompted to manually select the delimiter from the options shown below.

delimiter_manual

header_footer_counts

Tell the tool how many rows at the top and bottom of the file are not actual data. This ensures those rows are excluded from all calculations and operations.

#OptionDescription
1Header rowsRows at the top of the file that should not be treated as normal data.
2Footer rowsRows at the bottom of the file that should not be treated as normal data.

Example: If the first row contains column names and the last 2 rows contain summary text, enter 1, 2. If there are no header and no footer rows, enter 0, 0 or just press Enter.

7. Main Menu

main_menu

For each menu category, all available sub-options are displayed. The menu is categorized based on similar operations that can be applied to the file.

7.1 Navigation Keys

#OptionDescription
1Up / DownNavigate between options or select an option.
2Enter / ALT + EnterConfirm column selection or file selection.
3EnterConfirm multi-column selection.
4EscMove one step backwards.
5Backspace / DeleteClear or remove input.

7.2 Writing Output

Many calculations or validations can be written to an output file by selecting the ✎ Write to a file option and selecting a file path.

7.3 Column Selection Behaviour

  • If a header is passed and is correct, columns can be fetched via dropdown.

column_selection_dropdown

  • If a header is passed but data is wrong, or if no header is passed, column numbers are expected from the user.

column_selection_number

8. Data Quality

8.1 Column Validation

column_validation_menu

column_validation_result

Use this to check whether values in selected columns follow expected rules. Each column can be validated against one or more rule types, and any failing rows are reported.

Available validation types:

  • date
  • format
  • allowed values list
  • exact length
  • maximum length
  • minimum length
  • must not be empty
  • must be empty
  • valid number
  • sign check
  • numeric range
  • regex pattern
  • no extra whitespace

8.2 Column Count Check

column_count_check

Use this to check whether every row in the file has the expected number of columns. Useful for catching malformed rows where delimiters are missing or extra.

#OptionDescription
1ExactRow must have exactly the expected number of columns.
2Less thanFlags rows that have fewer columns than expected.
3More thanFlags rows that have more columns than expected.

column_count_result

8.3 Duplicate

duplicate

Use this to detect duplicate rows based on one or more key columns. Provides four options depending on whether you want to view, count, remove, or extract the duplicates.

#OptionDescription
1Find duplicatesDisplays all rows that are duplicated based on the selected key columns.
2Count duplicatesReturns the total count of duplicate rows found in the file.
3Remove duplicatesRemoves all duplicate rows and writes the distinct records to an output file, keeping the first occurrence of each.
4Write to a fileWrites only the duplicate records to a separate output file for review.

duplicate_result

8.4 File Profile

file_profile

Use this to get a column-level summary of the entire input file. Helps you understand the shape and quality of your data before running any operations on it.

Information shown per column:

  • Blanks
  • Min
  • Max
  • Average
  • Distinct values
  • Detected type

8.5 Cross-Column Comparison

cross_column_comparison

Use this to compare values across two numeric columns within the same file. Useful for validating relationships between columns, such as ensuring one value is always greater than another.

Operators available:

  • Greater than
  • Less than
  • Equal to
  • Greater than or equal to
  • Less than or equal to

9. Calculation

9.1 Sum

sum

Use this to total one or more numeric columns across all rows or a filtered subset. You can narrow down which rows to include using column-value filters before summing.

#OptionDescription
1Optional filterChoose column(s) and values to include only matching rows. Conditions: equal to, not equal to.
2Sign optionsControl which values to include: all values, positive only, negative only, or absolute values.

9.2 Group By

group_by

Use this to group rows by one or more columns and calculate aggregated results per group. The output can be reviewed on screen or written to a file.

Available calculations per group:

  • Sum
  • Average
  • Weighted average
  • Count
  • Maximum
  • Minimum
  • Count distinct

9.3 Reconciliation

reconciliation

Use this to reconcile data between two files. Select a recon type to compare either the total sum of a column or the total row count across File A and File B.

#OptionDescription
1SumCompares the total of selected columns between File A and File B and reports any difference.
2Row countCompares the total number of rows in File A versus File B.

10. Search & Filter

find_search

Use this to find rows based on a value in a selected column. Choose a match mode to control how the search is applied, then view results on screen or write them to a file.

#OptionDescription
1Exact matchRow must match the search value exactly.
2ContainsRow contains the search value anywhere within it.
3Starts withRow value begins with the search string.
4Ends withRow value ends with the search string.
5RegexRow matches the provided regular expression pattern.
6Does not matchRow does not match the provided value.

10.2 Fetch by Date Range

fetch_by_date_range

Use this to extract rows where a date column falls within a required range. You must provide the date format used in the file so the tool can parse it correctly.

#OptionDescription
1BetweenReturns rows where the date falls between two specified dates (inclusive).
2Greater thanReturns rows where the date is after the specified date.
3Less thanReturns rows where the date is before the specified date.

11. Transformation

11.1 Add Column

add_column

Use this to add a new column to the file. Choose the type of value the new column should contain, and the tool appends it to every row in the output.

#OptionDescription
1Constant valueEvery row gets the same fixed value in the new column.
2Formula / expressionThe new column value is computed from an expression using existing columns.
3Row numberThe new column contains the sequential row index.
4Current dateThe new column is populated with today's date.

11.2 Find & Replace

find_and_replace

Use this to replace values in selected columns or across all columns. Choose a match mode to control how values are matched before replacement is applied.

Match modes:

  • Exact — replaces only rows where the value matches exactly.
  • Contains — replaces in any row where the value appears within the cell.
  • Regex — replaces values matching the provided regular expression.

11.3 Remove Rows

remove_rows

Use this to remove unwanted rows from the file before further processing. Select the mode that matches which rows you want to drop.

#OptionDescription
1HeaderRemoves the first row of the file.
2FooterRemoves the last row of the file.
3First N rowsRemoves the first N rows as specified by the user.
4Row rangeRemoves rows within a specified start and end range.
5Blank rowsRemoves all rows that are completely empty.

11.4 Sort

sort

Use this to sort the file by one or more columns. For each selected column, choose ascending or descending order. Multi-column sort applies the order in the sequence you select them.

11.5 Trim / Normalize

trim_normalize

Use this to strip leading and trailing spaces from values in the file. Can be applied to specific columns or to all columns at once to clean up inconsistent spacing.

11.6 Column Select

column_select

Use this to keep only the columns you need in the output file. Columns are written in the exact order you select them, allowing you to reorder as well as filter.

12. File Operations

12.1 Merge

merge

Use this to combine two files into one. Provide key columns from both files for join-based merges. The merged result is written directly to an output file.

#OptionDescription
1Inner joinKeeps only rows where the key exists in both File A and File B.
2Left joinKeeps all rows from File A; matches from File B where available.
3Right joinKeeps all rows from File B; matches from File A where available.
4Full outer joinKeeps all rows from both files, matched where keys align.
5Append (vertical stack)Stacks File B rows directly below File A rows without any key matching.

12.2 Lookup

lookup

Use this to enrich the main file by appending columns from a reference file. Match is done on key columns from both files, and the selected reference value columns are appended to each matched row.

12.3 Unique

unique

Use this to extract unique values based on selected columns. Useful for quickly seeing what distinct values exist in a column and optionally how many times each appears.

#OptionDescription
1Values onlyLists distinct values without occurrence counts.
2Values with countsLists distinct values along with how many times each appears in the file.

12.4 Split

split

Use this to break one file into multiple output files. When splitting by column value, a separate file is created for each unique value in that column, named as output_filename_columnvalue.txt.

#OptionDescription
1Split by column valueCreates one output file per unique value in the selected column. Output files are named as outputfilename\<column_value>.txt.
2Split by row countSplits the file into chunks where each output file contains N rows as specified by the user.

12.5 File Diff

file_diff

Use this to compare two files and identify what is different between them. The key column(s) must exist in both files and should represent the same kind of record, such as an ID, email, or order number.

#OptionDescription
1Key onlyChecks which records exist in one file but are missing in the other, based on the key column(s) alone. Does not compare values.
2Key + value comparisonMatches records by key column(s), then compares the value column(s) to find rows where the key exists in both files but the data differs.